DNA barcoding has been widely used in species identification and biodiversity research. A short fragment of the
mitochondrial cytochrome c oxidase subunit I (COI) sequence serves as a DNA barcode. We collected DNA barcodes, based on
COI sequences, from 156 species (529 sequences) of fish, insects, and shellfish. We present results on phylogenetic
relationships to assess biodiversity in the Korean peninsula. Average GC% contents of the 68 fish species (46.9%), the 59
shellfish species (38.0%), and the 29 insect species (33.2%) are reported. Using the Kimura 2 parameter model in all possible
pairwise comparisons, the average interspecific distances were compared with the average intraspecific distances in fish
(3.22 vs. 0.41), insects (2.06 vs. 0.25), and shellfish (3.58 vs. 0.14). Our results confirm that distance-based DNA barcoding
provides sufficient information to identify and delineate fish, insect, and shellfish species. These results also confirm
that the development of an effective molecular barcode identification system is possible. All DNA barcode sequences
collected in our study will be useful for the interpretation of species-level identification and community-level patterns
in fish, insects, and shellfish in Korea, although at the species level, the rate of correct identification in a diversified
environment might be low.
Introduction

DNA barcoding is a simple and useful step toward
understanding ecosystems, and it supports biodiversity
research [1]. A short standardized DNA sequence (400-800 bp)
can be used to identify the species to which an individual
belongs. This approach works because genetic diversity
between species is markedly greater than that within
species [2]. Numerous computational analysis methods and
systems have been introduced for this purpose [3-5]. Such a
system can provide a rapid, accurate, cost-effective, and
automatable process for species identification. However, the
success rate of barcoding varies significantly among groups.
Moreover, global datasets that represent extensive
ecosystems are expected to pose particular difficulties,
especially in groups in which recent speciation rates are
high and effective population sizes are large and reasonably
stationary [6]. Several studies of species-level
identification have covered many groups of organisms,
including birds, fishes, and various arthropods [4, 6-8].

To apply the barcoding system to species identification, we
obtained 529 cytochrome c oxidase subunit I (COI) sequences
in this study, representing 156 species of fish, insects,
and shellfish from the Korean peninsula.
Methods

The first community-level barcoding studies were conducted
in the most diverse terrestrial and marine ecosystems in
inland and coastal areas of South Korea (include
reference). We collected samples to obtain an overview of
the variation patterns for 529 COI sequences among 68 fish
species, 29 insect species, and 59 shellfish species.
Multiple specimens were collected for most of the species.
Fish and shellfish were collected from Yeosu in Jeollanam-do;
shellfish were also collected from Taean; and insects were
collected from Chungcheongnam-do, Gangwon-do,
Gyeongsangbuk-do, and Jeollabuk-do in South Korea. Samples
were collected using different, technically appropriate
methods (Fig. 1, Supplementary Table 1) [9]. Where possible,
samples were obtained from widely distributed locations in
South Korea.

Fig. 1. Map showing the locations of the cruises and the materials
collected in this study. Each circle represents one sampling locality,
and circle size is proportional to the number of samples in our
study. Google Map was used (http://maps.google.co.kr) [9].

Genomic DNA was isolated from samples using the Qiagen
DNeasy 96 blood and tissue kit (Qiagen, Valencia, CA, USA)
according to the manufacturer's instructions. DNA fragments
of the target gene were amplified by polymerase chain
reaction (PCR) with primers for the COI gene (primer
sequences: LCO1490, GGTCAACAAATCATAAAGATATTGG, and
HCO2198, TAAACTTCAGGGTGACCAAAAAATCA) [10]. PCR
amplification was performed using Top-Taq PreMix (2x;
CoreBio, Seoul, Korea) under the following conditions:
denaturation (1 min at 94°C), annealing at 51°C, and
extension (2 min at 72°C). PCR products were purified with
the Core-One PCR purification kit (CoreBio), and TA cloning
was performed using the pGEM-T Easy Vector system (Promega,
Madison, WI, USA) by Macrogen Inc. The clones for each
marker were sequenced with forward (SP6) and reverse (T7)
primers using an ABI 3730XL sequencer (Applied Biosystems,
Foster City, CA, USA). The sequences reported in this paper
have been deposited in GenBank under accession numbers
HM180413-HM180941.
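
As a quick sanity check on the primer pair listed above, the following Python sketch reports the length and GC fraction of each primer and the reverse complement of the reverse primer. The helper names (`gc_fraction`, `reverse_complement`) and the dictionary keys are illustrative assumptions and are not part of the original protocol.

```python
# Primer sequences as given in the Methods text (5'->3').
# The keys "forward" and "reverse" are labels chosen for this sketch.
PRIMERS = {
    "forward": "GGTCAACAAATCATAAAGATATTGG",
    "reverse": "TAAACTTCAGGGTGACCAAAAAATCA",
}

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {100 * gc_fraction(seq):.1f}%")
    print("reverse primer, reverse complement:",
          reverse_complement(PRIMERS["reverse"]))
```

The reverse complement of the reverse primer is the form in which that primer site appears on the forward strand of an amplified COI fragment, which is useful when trimming primer sequences from reads.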

To obtain species information for each operational
taxonomic unit (OTU) in a phylogenetic tree, a BLAST search
was performed using the BLASTN program from NCBI [11]. The
cutoff for BLAST hits was set as follows: query
coverage > 90% and identity > 75% for COI. The levels of
sequence divergence within and between the selected species
were investigated using the pairwise Kimura 2 parameter
(K2P) distance model [12]. A neighbor-joining (NJ) tree was
constructed from K2P distances in MEGA4 [13], with gap
positions ignored on a pairwise basis. The distances were
arranged hierarchically into intraspecific differences and
interspecific differences within each genus. When the
sequence dataset consisted of only 2 genera from the same
family, an intergeneric comparison within the family was
not performed.
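
The K2P distance used above can be illustrated with a short Python sketch. It assumes two pre-aligned sequences of equal length and applies the standard Kimura 2 parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among the compared sites. Sites with a gap or an ambiguous base in either sequence are skipped, which mimics ignoring gap positions on a pairwise basis. The function name `k2p_distance` is an assumption made for this example; the study itself computed distances in MEGA4, not with this code.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2 parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are
    skipped (pairwise deletion). Returns substitutions per site;
    multiply by 100 to express the distance as a percentage.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: ignore this site
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences, not study data: one transition over ten sites.
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

Applying such a function to every pair of sequences yields the distance matrix from which intraspecific and interspecific comparisons, and an NJ tree, can be derived.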

Results and Discussion 

After BLASTN annotation, K2P distances were compared at
different taxonomic levels, revealing distinct patterns
both within and between species. Across the 156 species
represented (68 fish, 59 shellfish, and 29 insect species),
the interspecific K2P distances for the COI sequences
ranged from 0% to 45.25% (fish, 0% to 40.99%; insects, 0% to
10.34%; shellfish, 0% to 45.25%) (Fig. 2A), whereas the
intraspecific K2P distances for species with >3 sequences
ranged from 0% to 0.985% (fish, 0% to 0.985%; insects,
0.005% to 0.635%; shellfish, 0% to 0.817%) (Fig. 2B). The
average interspecific distances and average intraspecific
distances were, respectively, 3.58 and 0.14 in shellfish,
3.22 and 0.41 in fish, and 2.06 and 0.25 in insects
(Table 1). The contrast was greatest in shellfish, in which
the average interspecific K2P distance was 25.57-fold
higher than the average intraspecific value. In the mean
base composition of fish, insects, and shellfish, T
(thymine) was the most abundant base, ranging from 27.4% to
33.7%, and G (guanine) was the least abundant, ranging from
16.8% to 21.5% (Table 1). These findings for fish were
consistent with previous studies showing that T occurred
more frequently and G occurred less frequently than A
(adenine) and C (cytosine) [8].
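
The arithmetic behind these summary figures can be sketched in Python. The 25.57-fold value for shellfish corresponds to the ratio of the reported averages (3.58 / 0.14), and the same ratio can be computed for fish and insects. The second helper shows one way to compute percentage base composition for a single sequence. The names `fold_difference` and `base_composition` are assumptions made for this example, and the sequence in the usage lines is a toy string, not study data.

```python
from collections import Counter

# Average K2P distances as reported in the text:
# (interspecific, intraspecific) for each group.
REPORTED_MEANS = {
    "fish": (3.22, 0.41),
    "insects": (2.06, 0.25),
    "shellfish": (3.58, 0.14),
}


def fold_difference(inter: float, intra: float) -> float:
    """Ratio of the mean interspecific to the mean intraspecific distance."""
    return inter / intra


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G, and T in a sequence, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    for group, (inter, intra) in REPORTED_MEANS.items():
        print(f"{group}: {fold_difference(inter, intra):.2f}-fold")
    comp = base_composition("ATGCATTTGA")  # toy sequence
    print(comp, "GC% =", comp["G"] + comp["C"])
```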

Fig. 2. Distribution of interspecific and intraspecific Kimura 2
parameter (K2P) distances for cytochrome c oxidase subunit I (COI)
sequences from the 68 fish species, the 59 shellfish species, and
the 29 insect species. Vertical lines show the mean pairwise
distance at each level. The X- and Y-axes represent K2P distance
values and the percentage of individuals, respectively.
(A) Interspecific K2P distances. (B) Intraspecific K2P distances.

Table 1. Mean percentage base composition and K2P distances of COI sequences compared among fish, insects, and shellfish

Group      No. of    Mean K2P distance            Base (%)
           species   Interspecies  Intraspecies   A             C             G             T
Fish       68        3.215         0.41           25.9 ± 0.444  25.3 ± 0.588  21.5 ± 0.613  27.4 ± 0.525
Insects    29        2.063         0.25           31.1 ± 0.625  18.6 ± 0.542  16.8 ± 0.348  33.5 ± 0.757
Shellfish  59        3.577         0.14           29.2 ± 0.743  18.7 ± 0.370  18.4 ± 0.340  33.7 ± 0.856

When multiple individuals were collected for any one species, a single sequence was selected at random.
COI, cytochrome c oxidase subunit I; K2P, Kimura 2 parameter.

In our polytypic species analysis with more than 3
individuals in each species, the average intraspecific
difference was approximately 0.5%, and the maximum
intraspecific divergence was only 1.86% (Table 2). The
highest overall GC% content was found in the 18 species of
fish. Lower values were found in the 2 species of insects
and in the 6 species of shellfish (Table 2). The fish
Chelidonichthys spinosus had a high GC% content of 50.9%.
The mean GC% content of the 18 barcoded fish species was
higher than that of the 6 shellfish species (46.9 ± 2.2% vs.
38.0 ± 4.9%) (see also Table 2). Sixteen of the 21 species
with GC% content > 45% were fish, whereas only 1 shellfish
species exhibited GC% content > 45%. GC% content can be used
in a new approach to evaluate animal evolutionary
relationships, although the relationship between GC%
content and the evolutionary branching date is not very
accurate [14]. Moreover, the average divergence of
congeneric species pairs was greater than that found for
intraspecific differences, but 10 species in 5 genera had
interspecific distances below 0.1% (Table 3). These species
pairs were Hexagrammos agrammus/H. otakii, Ampedus
humeralis/A. subcostatus, Anomala luculenta/A. mongolica,
Chlorostoma argyrostoma turbinatum/C. turbinata, and
Omphalius rusticus rusticus/O. pfeifferi carpenteri. In
addition, the NJ tree exhibited shallow interspecific
divergence except at the first deep divergence (Fig. 3). In
fish, several clades had a high level of bootstrap support
(≥97%) (Fig. 3A). These clades included Thrysa chefuensis
and T. adelae, and Hexagrammos otakii and H. agrammus. In
insects, 2 clades separated out with a high level of
bootstrap support (>99%) (Fig. 3B): Anomala mongolica and
A. luculenta, and Ampedus humeralis and A. subcostatus. In
shellfish, the clades that had a high level of bootstrap
support (>95%) included Fusinus forceps and F. longicaudus,
and Mytilus galloprovincialis and M. edulis (Fig. 3C).
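
Two of the statements above can be checked with a few lines of Python. The sketch first summarizes the GC content of the 6 shellfish species listed in Table 2, assuming that the reported ± values are sample standard deviations (an assumption; the text does not say). It then filters a subset of the congeneric pairs from Table 3 by the 0.1% threshold. The names `mean_and_sd` and `pairs_below` are illustrative only.

```python
import statistics

# GC content (%) of the 6 shellfish species listed in Table 2.
SHELLFISH_GC = [35.8, 47.3, 38.5, 34.8, 33.7, 37.9]

# Subset of the congeneric pairs and maximum K2P distances from Table 3.
CONGENERIC_PAIRS = {
    ("Hexagrammos agrammus", "Hexagrammos otakii"): 0.047,
    ("Sebastes inermis", "Sebastes schlegelii"): 1.631,
    ("Ampedus humeralis", "Ampedus subcostatus"): 0.0,
    ("Anomala luculenta", "Anomala mongolica"): 0.0,
    ("Carabus jankowskii", "Carabus sternbergi"): 1.005,
    ("Chlorostoma argyrostoma turbinatum", "Chlorostoma turbinata"): 0.002,
    ("Omphalius rusticus rusticus", "Omphalius pfeifferi carpenteri"): 0.050,
}


def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


def pairs_below(pairs, threshold):
    """Species pairs whose maximum K2P distance is below the threshold."""
    return sorted(pair for pair, dist in pairs.items() if dist < threshold)


if __name__ == "__main__":
    mean, sd = mean_and_sd(SHELLFISH_GC)
    print(f"shellfish GC% = {mean:.1f} ± {sd:.1f}")
    close_pairs = pairs_below(CONGENERIC_PAIRS, 0.1)
    genera = {name.split()[0] for pair in close_pairs for name in pair}
    print(len(close_pairs), "pairs in", len(genera), "genera")
```

Pairs that fall below such a threshold are the cases in which a purely distance-based barcode cannot separate the nominal species, so they are natural candidates for additional markers or taxonomic review.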

Table 2. Maximum intraspecific distance and GC% content among fish, insects, and shellfish (threshold > 0.5%)

Category   Species                     Maximum intraspecific   No. of        GC content
                                       distance                individuals   (%)
Fish       Parajulis poecilepterus     1.862                   16            46.4
           Chelidonichthys spinosus    1.553                   5             50.9
           Sebastes inermis            1.521                   22            46.5
           Enedrias nebulosus          1.399                   5             47.5
           Chirolophis japonicus       1.387                   5             46.9
           Raja boesemani              1.370                   c J           46.9
           Muraenesox cinereus         1.324                   O J
           Takifugu niphobles          1.309                   14            47.2
           Collichthys lucidus         1.291                   6             47.0
           Sebastiscus marmoratus      1.291                   3             47.8
           Scyliorhinus torazame       1.288                   3             47
           Takifugu xanthopterus       1.256                   9             47.6
           Pholis fangi                1.232                   5             47.4
           Nuchequula nuchalis         1.163                   3             45.6
           Pseudogobius masago         1.131                   3             39.8
           Sillago japonica            I .UD4                  7             47.2
           Hexagrammos otakii          0.998                   10            48.2
           Hexagrammos agrammus        0.923                   7             48.2
Insects    Lycorma delicatula          0.953                   3             34.1
           Amara macronota             0.896                   3             32.3
Shellfish  Caetice depressus           1.394                   5             35.8
           Patelloida saccharina lanx  1.359                   9             47.3
           Reishia luteostoma          1.225                   10            38.5
           Oratosquilla oratoria       1.145                   7             34.8
           Mitrella bicincta           1.141                   10            33.7
           Saxidomus purpuratus        0.525                   3             37.9

Table 3. Maximum Kimura 2 parameter (K2P) distances for congeneric species pairs

Category   Species pairs                                               Maximum K2P distance
Fish       Hexagrammos agrammus/Hexagrammos otakii                     0.047
           Hexagrammos otakii/Hexagrammos sp.                          1.389
           Hexagrammos sp./Hexagrammos agrammus                        0.952
           Sebastes inermis/Sebastes schlegelii                        1.631
Insects    Ampedus humeralis/Ampedus subcostatus                       0
           Anomala chamaeleon/Anomala luculenta                        0.124
           Anomala luculenta/Anomala mongolica                         0
           Anomala mongolica/Anomala chamaeleon                        0.124
           Apogonia cribricollis/Apogonia cupreoviridis                0.280
           Carabus jankowskii/Carabus sternbergi                       1.005
           Harpalus discrepans/Harpalus tsushimanus                    0.113
           Maladera japonica/Maladera okamotoi                         0.222
Shellfish  Acanthochitona achates/Acanthochitona defilippi             1.231
           Acanthochitona defilippi/Acanthochitona rubrolineata        0.251
           Acanthochitona rubrolineata/Acanthochitona achates          1.693
           Ceratostoma inornatus/Ceratostoma rorifluum                 0.195
           Chlorostoma argyrostoma turbinatum/Chlorostoma turbinata    0.002
           Mytilus edulis/Mytilus galloprovincialis                    0.201
           Notoacmea schrenckii/Notoacmea schrenkii                    1.299
           Omphalius pfeifferi carpenteri/Omphalius rusticus           1.089
           Omphalius rusticus/Omphalius rusticus rusticus              1.031
           Omphalius rusticus rusticus/Omphalius pfeifferi carpenteri  0.050

Fig. 3. The neighbor-joining tree of fish, insects, and shellfish based on cytochrome c oxidase subunit I (COI) sequences. (A) Fish. (B)
Insects. (C) Shellfish.

In conclusion, we obtained DNA barcodes using COI
sequences from fish, insects, and shellfish. The aims of this
research were species identification and contribution to
biodiversity research. At the species level, the rate of
correct identifications might be low in a diversified
environment. However, DNA barcode sequences can be used for
the interpretation of species-level identification and
community-level patterns in fish, insects, and shellfish.



Supplementary materials 

Species identity and collection information for barcoded 
fish, insects, and shellfish in Korea. Supplementary data 
including one table can be found with this article online at 
http://www.genominfo.org/src/sm/gni-10-206-s001.pdf. 